How to generate text: using different decoding methods for language generation with Transformers
https://huggingface.co/blog/how-to-generate
Greedy Search
Greedy search selects the word with the highest probability as its next word: w_t = argmax_w P(w | w_{1:t-1}) at each timestep t
https://huggingface.co/blog/assets/02_how-to-generate/greedy_search.png
At each step, pick the next token with the highest probability
Because it is deterministic, it tends to fall into repeating the same text
code:python
# greedy search: what generate() does when neither sampling nor beams are requested
model.generate(**model_inputs, max_new_tokens=40)
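A minimal sketch of the greedy rule on a toy next-word table. TOY_LM and greedy_decode are hypothetical names, and the probabilities are made-up illustrative numbers, not outputs of a real model.

```python
# Toy next-word table: context (tuple of words) -> {next word: probability}.
# The numbers are illustrative assumptions, not model outputs.
TOY_LM = {
    ("The",): {"nice": 0.5, "dog": 0.4, "car": 0.1},
    ("The", "nice"): {"woman": 0.4, "house": 0.3, "guy": 0.3},
    ("The", "dog"): {"has": 0.9, "runs": 0.05, "and": 0.05},
    ("The", "car"): {"drives": 0.5, "is": 0.3, "turns": 0.2},
}


def greedy_decode(lm, prompt, max_new_tokens):
    """Append the single most probable next word at every step."""
    sequence = tuple(prompt)
    score = 1.0
    for _ in range(max_new_tokens):
        dist = lm.get(sequence)
        if dist is None:  # the toy table has no entry for this context
            break
        word = max(dist, key=dist.get)  # w_t = argmax_w P(w | w_{1:t-1})
        score *= dist[word]
        sequence += (word,)
    return sequence, score


greedy_sequence, greedy_score = greedy_decode(TOY_LM, ["The"], max_new_tokens=2)
print(" ".join(greedy_sequence), greedy_score)
```

With this table the greedy path is "The nice woman" (0.5 × 0.4 = 0.2), even though "The dog has" has a higher overall probability (0.4 × 0.9 = 0.36). Greedy search misses it because "dog" loses at the first step.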
Beam search
https://huggingface.co/blog/assets/02_how-to-generate/beam_search.png
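Looks beyond the immediate next token to the words that follow, and keeps the token sequences with the highest overall probability
Beam size (num_beams): the number of candidate sequences kept at each step
👉 Understanding Beam Search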
code:python
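beam_output = model.generate(
    **model_inputs,
    max_new_tokens=40,
    num_beams=5,  # beam size: number of hypotheses kept at each step
    # no_repeat_ngram_size=2,  # uncomment to forbid any 2-gram from appearing twice
    early_stopping=True,  # stop as soon as num_beams finished candidates exist
)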
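A small sketch of the beam search idea on a toy next-word table. TOY_LM and beam_search are hypothetical names, and the probabilities are illustrative assumptions. It is not how Transformers implements beam search internally.

```python
import math

# Toy next-word table: context (tuple of words) -> {next word: probability}.
# The numbers are illustrative assumptions, not model outputs.
TOY_LM = {
    ("The",): {"nice": 0.5, "dog": 0.4, "car": 0.1},
    ("The", "nice"): {"woman": 0.4, "house": 0.3, "guy": 0.3},
    ("The", "dog"): {"has": 0.9, "runs": 0.05, "and": 0.05},
    ("The", "car"): {"drives": 0.5, "is": 0.3, "turns": 0.2},
}


def beam_search(lm, prompt, max_new_tokens, num_beams):
    """Keep the num_beams best partial sequences (by log-probability) per step."""
    beams = [(0.0, tuple(prompt))]  # (log-probability, sequence)
    for _ in range(max_new_tokens):
        candidates = []
        for log_prob, sequence in beams:
            dist = lm.get(sequence)
            if dist is None:  # no known continuation: carry the beam forward
                candidates.append((log_prob, sequence))
                continue
            for word, prob in dist.items():
                candidates.append((log_prob + math.log(prob), sequence + (word,)))
        # Prune to the num_beams highest-scoring hypotheses.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:num_beams]
    return beams


best_log_prob, best_sequence = beam_search(TOY_LM, ["The"], 2, num_beams=2)[0]
greedy_like = beam_search(TOY_LM, ["The"], 2, num_beams=1)[0][1]
print(" ".join(best_sequence), math.exp(best_log_prob))
```

With num_beams=2 the "dog" branch survives the first step, so the search ends on "The dog has" (probability 0.36). With num_beams=1 the same function reduces to greedy search and returns "The nice woman".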
Sampling
Random (unlike the two methods above, it is not deterministic)
Sampling means randomly picking the next word w_t according to its conditional probability distribution: w_t ~ P(w | w_{1:t-1})
code:python
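model.generate(
    **model_inputs,
    max_new_tokens=40,
    do_sample=True,  # sample instead of taking the argmax
    top_k=0,  # 0 disables Top-K filtering, so the full vocabulary is sampled
)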
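A minimal sketch of pure sampling from one next-word distribution, using only the standard library. sample_next_word and the probabilities are hypothetical, and the seed is fixed only to make the run repeatable.

```python
import random


def sample_next_word(dist, rng):
    """Draw one word at random, weighted by its probability."""
    words = list(dist)
    weights = [dist[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


# Illustrative next-word distribution (assumed numbers, not model outputs).
next_word_dist = {"nice": 0.5, "dog": 0.4, "car": 0.1}

rng = random.Random(0)  # fixed seed only so the run is repeatable
draws = [sample_next_word(next_word_dist, rng) for _ in range(10_000)]
frequencies = {w: draws.count(w) / len(draws) for w in next_word_dist}
print(frequencies)
```

Over many draws the empirical frequencies approach the given probabilities. Any single draw, though, can return the unlikely "car", which is why sampling is not deterministic.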
Top-K Sampling
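Sample only from the k tokens with the highest probability
In Top-K sampling, the K most likely next words are filtered and the probability mass is redistributed among only those K next words.
Looking at the figure, because the cutoff is a fixed rank K, low-probability tokens end up included as well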
code:python
model.generate(
    **model_inputs,
    max_new_tokens=40,
    do_sample=True,
    top_k=50,  # sample only from the 50 most likely next tokens
)
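A sketch of the Top-K filtering step on one toy distribution: keep the K most likely words, then renormalize. top_k_filter and the numbers are hypothetical. A real implementation works on logits over the whole vocabulary.

```python
def top_k_filter(dist, k):
    """Keep the k most probable words and renormalize their probabilities."""
    kept = sorted(dist.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(prob for _, prob in kept)
    return {word: prob / total for word, prob in kept}


# Illustrative next-word distribution (assumed numbers, not model outputs).
next_word_dist = {"nice": 0.5, "dog": 0.3, "car": 0.15, "house": 0.04, "turns": 0.01}

filtered = top_k_filter(next_word_dist, k=3)
print(filtered)
```

The three survivors held 0.95 of the mass, so each probability is divided by 0.95. The next word is then sampled from this smaller distribution.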
Top-p (nucleus) sampling
Sample only from the top tokens that together cover probability mass p (e.g. p = 0.92 means 92%)
Top-p sampling chooses from the smallest possible set of words whose cumulative probability exceeds the probability p
code:python
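model.generate(
    **model_inputs,
    max_new_tokens=40,
    do_sample=True,
    top_p=0.92,  # keep the smallest token set covering 92% of the probability mass
    top_k=0,  # 0 disables Top-K so that only Top-p filtering applies
)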
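A sketch of the nucleus filtering step, assuming the simple rule "add words in descending probability until the running total reaches p". top_p_filter and both toy distributions are hypothetical.

```python
def top_p_filter(dist, p):
    """Keep the smallest set of top words whose cumulative probability reaches p."""
    ranked = sorted(dist.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for word, prob in ranked:
        kept.append((word, prob))
        cumulative += prob
        if cumulative >= p:  # the nucleus is complete
            break
    # Renormalize so the kept probabilities sum to 1.
    return {word: prob / cumulative for word, prob in kept}


# Two illustrative distributions (assumed numbers, not model outputs).
peaked = {"nice": 0.9, "dog": 0.05, "car": 0.03, "house": 0.02}
flat = {"nice": 0.3, "dog": 0.25, "car": 0.25, "house": 0.2}

peaked_nucleus = top_p_filter(peaked, p=0.92)
flat_nucleus = top_p_filter(flat, p=0.92)
print(len(peaked_nucleus), len(flat_nucleus))
```

Unlike Top-K, the number of kept words adapts to the distribution: the peaked one keeps only 2 words, the flat one keeps all 4.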
temperature
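Divide the logits by a positive real number T (the temperature) before the softmax
A large T pushes the distribution toward uniform; a small T sharpens it toward the most likely token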
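A minimal sketch of temperature scaling with the standard library. softmax_with_temperature and the logits are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Divide the logits by the temperature, then apply a softmax."""
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # illustrative values, not model outputs

sharp = softmax_with_temperature(logits, temperature=0.5)
plain = softmax_with_temperature(logits, temperature=1.0)
flat = softmax_with_temperature(logits, temperature=10.0)
print(sharp, plain, flat)
```

The top probability shrinks as the temperature grows, and at T = 10 the three probabilities are nearly equal. In model.generate this is controlled by the temperature argument together with do_sample=True.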
👉
IMO: worth reading together with Text generation strategies